Abstract

One aspect of fault-tolerance in process control programs is the
ability to tolerate sensor failure. This paper presents a methodology
for transforming a process control program that cannot tolerate sensor
failures into one that can. Issues addressed include modifying
specifications in order to accommodate uncertainty in sensor values and
averaging sensor values in a fault-tolerant manner. In addition, a
hierarchy of sensor failure models is identified, and both the attainable
accuracy and the run-time complexity of sensor averaging with respect
to this hierarchy are discussed.

Keywords: fault-tolerance, process control systems, real-time
distributed systems.


1 Introduction 

A process control program communicates and synchronizes with a physical
process. Typically, the program reads values from the physical process
through sensors and writes values through actuators, as shown schematically
in Figure 1. This paper is concerned with tolerating failures of
continuous-valued sensors.

The approach developed in this paper is outlined as follows: 

* This work was supported by the Defense Advanced Research Projects Agency (DoD)
under NASA Ames grant number NAG 2-593, Contract N00140-87-C-8904. The views,
opinions, and findings contained in this report are those of the authors and should not be
construed as an official Department of Defense position, policy, or decision.

Figure 1: A process control program

1. A specification of the control program is written in terms of the state 
variables of the physical system. For example, the specification of a 
program controlling a chemical reaction vessel would refer to a variable 
T whose value is assumed to be the temperature of the vessel. 

2. Each physical state variable referenced by the specification is replaced 
with a reference to an abstract sensor. An abstract sensor is a set 
of values that contains the physical variable of interest. Uncertainty 
in sensor values now becomes an issue, and the specification must be 
re-examined and possibly changed to accommodate it. 

3. The control program is written based on the specification produced by
Step 2. This program reads abstract sensors that are assumed to always
contain the correct value of the corresponding physical variables.

4. For each abstract sensor referenced by the program written in Step 3,
a set of abstract sensors that fail independently is constructed. Each
abstract sensor is implemented using a concrete sensor, which is a
physical device, such as a thermometer, that “reads” a physical variable¹.
This step will require some knowledge of the physical process
being controlled as well as the specification of the concrete sensor.

5. A fault-tolerant averaging algorithm is used with these replicated
abstract sensor values in order to calculate another abstract sensor that
is correct even if some of the original sensors are incorrect. The
averaging algorithm assumes that no more than f out of the n abstract
sensors are incorrect, where f is a parameter. The relation between n
and f depends on the way sensors can fail.

¹ The concrete sensor need not sense the exact physical state variable of interest. For
example, an abstract temperature sensor could be constructed from a pressure gauge by
using the ideal gas law: PV = nRT.

The resulting system will have a structure like that shown in Figure 2. 

Figure 2: Replicated sensors (figure labels: abstract sensor, concrete sensor)

The rest of the paper is organized as follows. In Section 2, we define a
method of representing sensors that makes them amenable to replication and
discuss the effect of uncertainty on process control program specifications.
In Section 3, we discuss sensor failure models and present a sensor averaging
algorithm. Section 4 contains a demonstration of our methodology.

2 Physical State Variables and Concrete Sensors 

A variable in a computer is quite different from a state variable in a physical
process. A computer variable takes on values from a finite domain, and
can assume only a bounded number of values in any finite time period.
A physical state variable, however, may take on any real value at arbitrary
times. A convenient way to represent a physical state variable in a computer
program is as a function. The domain of such a function is typically time, but
it can be some other physical variable, depending on the safety properties
of interest.

A concrete sensor is a device that can be used to sample a physical state
variable. For example, a computer controlling a reaction vessel might have a
thermometer as a concrete sensor. A concrete sensor may interact with the
computer in a variety of ways: the computer may poll the sensor, the sensor
may asynchronously alert the computer when a certain value is sensed, or
the sensor may send a stream of values to the computer where each value
indicates that the physical variable has changed by a certain amount. We
will assume that a concrete sensor $\sigma$ has a specification $\Phi_\sigma$, and will call this
sensor faulty if it exhibits a behavior not consistent with its specification.

For example, consider a thermometer whose value is read by polling.
Suppose this concrete sensor returns a value $\hat T$ with an accuracy of $\epsilon$
degrees and the computer obtains the sensor’s value within $\delta$ seconds of the
thermometer being sampled. If the time the computer program receives $\hat T$
is $\hat t$, then the specification of this thermometer is:

$$\Phi(\hat T, \hat t) \equiv \exists t_0 : \hat t - \delta \le t_0 \le \hat t : \hat T - \epsilon/2 \le T(t_0) \le \hat T + \epsilon/2$$

A concrete sensor is not a very convenient mechanism. For example, with
the thermometer:

• The sensor has a limited accuracy. Network delay and processor 
scheduling further limit the accuracy of the sensor. 

• The control program may be interested in a temperature at a time the 
thermometer was not sampled. A value must then be interpolated; 
doing so requires knowledge of the physical process being monitored. 

• Some properties of the concrete sensor, while important to the
implementation, should be irrelevant to the specification used by the process
control program. For example, another thermometer might generate
an interrupt if the temperature rises above 100 degrees. This is an
important property of the sensor: it allows for an accurate determination
of when 100 degrees is reached. There may be other ways to make the
same kind of precise measurement, however, for a sensor that is polled.
It would be convenient if the control program could be the same for
any method of measurement, as long as the measurement is accurate
enough.
We will address these difficulties in two ways. The first problem cannot
be eliminated, so in Section 2.2 the effect of inaccuracy in specifications is
addressed. The other two problems, interpolation and data abstraction, are
addressed here by abstract sensors.

2.1 Abstract Sensors 

An abstract sensor is a piecewise continuous function from a physical state
variable to a dense interval of real numbers. We will denote an abstract
sensor with an overbar over the variable, such as $\bar T(t)$. When possible, we
will simply write $\bar T$ if we are interested in the “current” value; that is, the
sensor value for the current value of $t$. Intuitively, the interval $\bar T$ represents the
possible values of $T$, given the imprecision of the concrete sensor used to
compute $\bar T$ and any uncertainty in the physical process.

An abstract sensor $\bar T(t)$ can be represented as a pair of functions $T_{min}(t)$
and $T_{max}(t)$, allowing $\bar T(t)$ to be the interval $[T_{min}(t) \,..\, T_{max}(t)]$. The
accuracy of an abstract sensor is the width of the interval, or $|\bar T(t)|$. With
this representation, $\min \bar T(t) = T_{min}(t)$, $\max \bar T(t) = T_{max}(t)$, and
$|\bar T(t)| = T_{max}(t) - T_{min}(t)$.

An abstract sensor $\bar T$ is correct if it is not too inaccurate and always
includes the value of the actual physical variable. More precisely, for some
upper bound $acc_T$ on the accuracy of $\bar T$,

$$\bar T \text{ correct over } D \;\stackrel{\text{def}}{=}\; \forall t \in D : \min \bar T(t) \le T(t) \le \max \bar T(t) \;\wedge\; |\bar T(t)| \le acc_T$$
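The interval representation and the correctness predicate can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Interval` and `correct_over` are hypothetical, and the domain D is approximated by a finite collection of sample points, so the check can refute correctness but cannot prove it over a continuous domain.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo .. hi] standing for one abstract sensor value."""
    lo: float
    hi: float

    def width(self) -> float:
        # The accuracy of the abstract sensor value.
        return self.hi - self.lo

    def contains(self, x: float) -> bool:
        return self.lo <= x <= self.hi


def correct_over(sensor: Callable[[float], Interval],
                 truth: Callable[[float], float],
                 domain: Iterable[float],
                 acc: float) -> bool:
    """Check containment and the accuracy bound at each sampled point of D."""
    return all(sensor(t).contains(truth(t)) and sensor(t).width() <= acc
               for t in domain)


# Hypothetical physical variable and a sensor that brackets it by +/- 1 degree.
def temperature(t: float) -> float:
    return 20.0 + 0.5 * t


def bracketing_sensor(t: float) -> Interval:
    return Interval(temperature(t) - 1.0, temperature(t) + 1.0)


samples = [float(t) for t in range(10)]
print(correct_over(bracketing_sensor, temperature, samples, acc=2.0))
```

With a tighter bound such as `acc=1.0` the same sensor is reported as not correct, because its width of 2 exceeds the bound even though it always contains the true value.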

We assume that a failure of an abstract sensor can arise when the
underlying concrete sensor fails. As will be discussed in Section 3, a hierarchy
of failure classes can be defined:

• fail-stop failures (following [17]), in which a failed abstract sensor can
be detected²;

• arbitrary failures with bounded inaccuracy³, in which either $|\bar T(t)| \le acc_T$
is always true or $acc_T$ is known, and thus abstract sensors that
are too inaccurate can be detected;

• arbitrary failures, in which an abstract sensor can fail arbitrarily.

² The value of a failed fail-stop sensor can be defined to be the empty interval whose
value is [e .. e - 1] for some value of e. The empty interval has the convenient properties
that it contains no points and intersects no interval, including itself.

³ We use the term bounded inaccuracy to refer to bounding from above the accuracy of
an abstract sensor. Similarly, an abstract sensor is too inaccurate if the numeric value of
its accuracy is too large.

Given a concrete sensor, it may not be easy to implement an abstract
sensor. In general, it may require considerable knowledge about the physical
process being monitored. For example, consider the specification $\Phi(\hat T, \hat t)$ for
the polled thermometer. The specification alone is not sufficient
information to define an abstract sensor $\bar T$, since we don’t know how to interpolate
values between successive sensor readings. Suppose, however, we know from
the physical process being monitored that $|dT/dt| \le \Delta_T$. This bound on the
change of $T$ allows us to interpolate intermediate values with a known
accuracy. The abstract sensor $\bar T(t)$ can be defined as

$$\hat T - \epsilon/2 - \Delta_T (t - \hat t + \delta) \le \bar T(t) \le \hat T + \epsilon/2 + \Delta_T (t - \hat t + \delta) \quad \text{for } t \ge \hat t$$

One can use this example as a recipe for writing abstract sensors, but the
resulting sensor may be too inaccurate for any practical use. For example,
if $|dT/dt|$ can be bounded more tightly at certain known times, a more accurate
sensor can be constructed. In Section 4, the development of an abstract
sensor is shown in some detail.
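As a rough illustration of this recipe, the following Python sketch builds the interpolating abstract sensor from a single polled reading. The function name `thermometer_sensor` and its parameter names are hypothetical; `rate_bound` stands for the assumed bound on the rate of change of the temperature, `eps` for the thermometer's accuracy, and `delta` for the bound on the delay between sampling and receipt.

```python
from typing import Callable, Tuple


def thermometer_sensor(reading: float, t_recv: float, eps: float,
                       delta: float, rate_bound: float
                       ) -> Callable[[float], Tuple[float, float]]:
    """Return a function t -> (lo, hi) valid for t >= t_recv."""
    def sensor(t: float) -> Tuple[float, float]:
        if t < t_recv:
            raise ValueError("defined only at or after the time of receipt")
        # The sample may be up to delta old when received, and the
        # temperature may drift by at most rate_bound per unit time since.
        slack = eps / 2 + rate_bound * (t - t_recv + delta)
        return (reading - slack, reading + slack)
    return sensor


# Hypothetical numbers: 100 degrees received at t = 10 s, accuracy 1 degree,
# delay bound 0.5 s, temperature changing by at most 2 degrees per second.
sensor = thermometer_sensor(100.0, t_recv=10.0, eps=1.0,
                            delta=0.5, rate_bound=2.0)
print(sensor(10.0))   # interval at the moment of receipt
print(sensor(12.0))   # interval two seconds later, necessarily wider
```

The width grows linearly with the time since the last reading, which is why a tighter bound on the rate of change, where one is known, yields a more accurate sensor.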

2.2 Abstract Sensors in Specifications 

The specification of a system typically includes a set of safety conditions:
predicates on the state of the system that the implementation must ensure
are always true. A safety condition on a process control program will
reference physical state variables. For example, consider a reaction vessel with
a pressure relief valve. One safety condition might be that whenever the
pressure $p$ is greater than some ceiling $p_{max}$, the valve must be open. We
could write this safety condition as $p > p_{max} \equiv open$, where $open$ is a state
function that is true when the valve is open.

The specification of a process control program will have to be changed
when expressed in terms of abstract sensors. It is not possible to take a
control program written in terms of physical state variables and, for each
reference to such a variable, substitute a reference to a corresponding abstract
sensor. Consider $p > p_{max} \equiv open$. The condition that results from
replacing the physical state variable with an abstract sensor is $\bar p > p_{max} \equiv open$;
one must decide what the term $\bar p > p_{max}$ means.
Let $S$ be a predicate on the system state and $V$ be the set of physical
variables mentioned in $S$ that will be accessed through abstract sensors. We
need another condition $S'$ that contains no references to any $v_i \in V$ but
may instead contain references to $\bar v_i$. The only constraint on $S'$ is that it
reduces to $S$ when the abstract sensors have perfect accuracy⁴:

$$(S' \wedge \forall i : |\bar v_i| = 0) \Rightarrow S^{v_i}_{\bar v_i}$$

There are several ways such an $S'$ can be constructed. We could replace
all references to $v_i$ in $S$ with references to the midpoint of $\bar v_i$. However, if all
values in $\bar v_i$ have the same likelihood of being valid, then there are only two
reasonable alternatives. We can either require that all points in $\bar v_i$ satisfy $S$
or that there exists at least one point in $\bar v_i$ that satisfies $S$. More precisely,
for each physical variable $v_i$ the condition $S$ can be generalized as

$$S' \stackrel{\text{def}}{=} \forall v_i \in \bar v_i : S \quad \text{or} \quad S' \stackrel{\text{def}}{=} \exists v_i \in \bar v_i : S$$

The generalization of $S$ cannot be done automatically, since it is really a
refinement of the problem specification. Ideally, one would like to strengthen
$S$ so that states excluded by the safety condition are still excluded. For
example, we might want to assert that a catalyst is injected (denoted by
the state function $C$) only when the pressure is above a minimum value:
$C \Rightarrow (p > p_{min})$. In this case, the state we are trying to avoid is one
where the catalyst is injected at too low a pressure, and we can strengthen
$C \Rightarrow (p > p_{min})$ to $C \Rightarrow (\forall p \in \bar p : p > p_{min})$.

We may find, however, that a specification cannot be strengthened in a
meaningful way. The property $p > p_{max} \equiv open$ is an example. Changing
the property to $(\forall p \in \bar p : p > p_{max}) \equiv open$ will allow states with $p > p_{max}$
and $\neg open$, and changing the specification to $(\exists p \in \bar p : p > p_{max}) \equiv open$
will allow states with $p \le p_{max}$ and $open$. Unless we can guarantee that
$|\bar p| = 0$, the program’s specification must be changed. Here, we are probably
more interested in avoiding an explosion of the vessel. If so, the condition
we want is $(\exists p \in \bar p : p > p_{max}) \equiv open$, and we would accept the fact that
the pressure valve may be unnecessarily open.
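For a threshold predicate such as "p is greater than a limit", the two generalizations reduce to comparisons against the endpoints of the interval. The Python sketch below illustrates this for the catalyst and relief valve examples; the function names are hypothetical and the interval is assumed to be non-empty (lo <= hi).

```python
def all_above(lo: float, hi: float, threshold: float) -> bool:
    """Universal form: every point of [lo .. hi] exceeds the threshold."""
    return lo > threshold


def some_above(lo: float, hi: float, threshold: float) -> bool:
    """Existential form: at least one point of [lo .. hi] exceeds it."""
    return hi > threshold


def catalyst_allowed(p_lo: float, p_hi: float, p_min: float) -> bool:
    # Strengthened condition: inject only if the pressure is certainly high enough.
    return all_above(p_lo, p_hi, p_min)


def valve_must_open(p_lo: float, p_hi: float, p_max: float) -> bool:
    # Changed condition: open whenever the pressure could exceed the ceiling.
    return some_above(p_lo, p_hi, p_max)


# Hypothetical pressure interval straddling a ceiling of 10.0.
print(valve_must_open(9.8, 10.3, p_max=10.0))
print(catalyst_allowed(1.9, 2.6, p_min=2.0))
```

Both choices err on the safe side for their respective hazards: the valve may open unnecessarily, and the catalyst may be withheld when the true pressure would have permitted it.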

It shouldn’t be surprising that, in some cases, a property of a specification
must be changed (as compared to being strengthened) when references
to physical state variables are replaced with references to abstract sensors.
Using abstract sensors exposes uncertainty in the physical process’ state and
a specification may have been written implicitly assuming no such
uncertainty. Of course, specifications are sometimes written with such uncertainty
explicitly mentioned. For example, an informal expression of the pressure
relief valve property might be “if the pressure rises to within 0.1 millibars of
$p_{max}$ then the relief valve must open”. In our notation, this property would
be expressed as $((\exists p \in \bar p : p > p_{max}) \equiv open) \wedge (|\bar p| \le 0.1)$.

⁴ The expression $S^{v_i}_{\bar v_i}$ is $S$ with all occurrences of physical state variable $v_i$ changed to
abstract sensor $\bar v_i$.

3 Fault-Tolerant Abstract Sensors 

Given n independent abstract sensors and some assumptions about failures,
we would like to construct an abstract sensor that is tolerant of failures. We
will first present an algorithm that constructs a sensor containing the correct
value given that no more than f of the original sensors are not correct. We
will then consider how this algorithm performs with different failure models.

3.1 Fault-Tolerant Sensor Averaging 

Let $\bar T_i$ and $\bar T_j$ ($i \ne j$) be two abstract sensors for the same physical value
$T$. If $\bar T_i$ and $\bar T_j$ both contain the correct value, then the intervals $\bar T_i$ and $\bar T_j$
must intersect, and their intersection must contain the (unknown) value $T$.

If f or fewer sensors do not contain the correct value, then any (n - f)-clique,
or set of n - f mutually intersecting sensors, may contain the correct
value, since they each share a common value. Conversely, any point not
contained in at least n - f intervals cannot be the correct value; if it were,
then there would be more than f sensors that do not contain the correct
value. So, the cover of all (n - f)-cliques must contain the correct value.
This gives us an abstract sensor averaging algorithm.

Algorithm 1 Fault-tolerant Sensor Averaging

Let $S$ be a set of values taken from n abstract sensors, and suppose
the abstract sensors are of the same physical state variable
where their values were read at the same point in their domain
(e.g. at the same time). Assuming that at most f of these sensors
are incorrect, calculate $\cap_{f,n}(S)$, which is the smallest interval
that is guaranteed to contain the correct physical value.

Implementation: Let $l$ be the smallest value contained in at least
n - f of the intervals in $S$ and $h$ be the largest value contained in
at least n - f of the intervals in $S$ (by assumption, these values
must exist). Let $\cap_{f,n}(S)$ be the interval $[l \,..\, h]$.

Algorithm 1 is inexpensive: it can be implemented in O(n log n) time.
Appendix B gives an implementation that has this running time.
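One way to reach the O(n log n) bound is a sweep over sorted interval endpoints. The Python sketch below is an independent illustration and not the implementation from Appendix B; the name `ft_average` and the tuple representation of intervals are assumptions, and empty intervals (lo > hi) are skipped because they contain no points.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # closed interval (lo, hi); lo > hi means empty


def ft_average(intervals: List[Interval], f: int) -> Optional[Interval]:
    """Return (l, h) as in Algorithm 1, or None if no point lies in at
    least n - f of the intervals."""
    n = len(intervals)
    need = n - f
    if f < 0 or need < 1:
        raise ValueError("require 0 <= f < n")

    # Endpoint events. At equal coordinates an opening (0) sorts before a
    # closing (1), so closed intervals that merely touch count as overlapping.
    events = []
    for lo, hi in intervals:
        if lo > hi:
            continue  # empty interval, e.g. a detected fail-stop sensor
        events.append((lo, 0))
        events.append((hi, 1))
    events.sort()  # the O(n log n) step; the sweep below is linear

    depth = 0            # number of intervals containing the sweep point
    low = high = None
    for x, kind in events:
        if kind == 0:
            depth += 1
            if depth >= need and low is None:
                low = x  # smallest point covered by at least n - f intervals
        else:
            if depth >= need:
                high = x  # largest such point seen so far
            depth -= 1
    return None if low is None else (low, high)


S = [(0.0, 4.0), (2.0, 6.0), (5.0, 9.0)]
print(ft_average(S, 1))
```

For this `S` and f = 1 the points covered by at least two intervals form the two pieces [2 .. 4] and [5 .. 6], and the function returns the single interval (2.0, 6.0) that spans both. With f = 0 it returns `None`, since the three intervals have no common point.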

The accuracy of $\cap_{f,n}(S)$ depends on the value of f, as illustrated in
Figure 3. In this example, the value of $\cap_{0,n}(S)$ is the empty interval because
it is impossible for both intervals a and b to contain the correct value; at
least one of them must be incorrect. In general and when defined, $\cap_{0,n}(S)$ is
the intersection of the intervals in $S$, $\cap_{n-1,n}(S)$ is the cover of the intervals
in $S$, and $|\cap_{f,n}(S)| \le |\cap_{f',n}(S)|$ if $f < f'$.

Figure 3: Intersection with f = 1, 2, 3 and 4 (showing $\cap_{1,5}$, $\cap_{2,5}$, $\cap_{3,5}$ and $\cap_{4,5}$)

One consequence of the definition of $\cap_{f,n}(S)$ is that for $f > 0$, $\cap_{f,n}(S)$
can contain values that cannot be the correct value. For example, Figure 4
shows the intersection of three intervals a, b and c. If f = 1 then the correct
value must be within I1 or I2. Algorithm 1, however, would calculate the
interval I. The points between I1 and I2 are added to preserve the “shape”
of the abstract sensor as seen by the control program.

It is instructive to compare $\cap_{f,n}(S)$ with n-modular redundancy [20]
(NMR). In NMR, n independently produced values of a variable are presented
to a voter that selects the majority value as its output. By doing so, the
voter can mask up to f incorrect inputs where $n \ge 2f + 1$. The function
$\cap_{f,n}(S)$ resembles an NMR voter, except that it accepts intervals rather than
points as inputs and it produces the most accurate value possible as output
for any value of f with $0 \le f < n$. If the inputs to $\cap_{f,n}(S)$ are point intervals
(that is, have a width of zero), then the NMR voter and $\cap_{f,n}(S)$ produce the
same output when $n \ge 2f + 1$.

Figure 4: Intersection with n = 3 and f = 1 (intervals a, b and c; sub-intervals I1 and I2; result I)

The relation of f to n (and hence the accuracy of $\cap_{f,n}(S)$) depends on
the failure model that is assumed. We will first assume arbitrary failures
(both with and without bounded inaccuracy) and then consider a fail-stop
failure model. We assume that no more than f of the n sensors can be faulty
and that once failed, a sensor remains failed.

3.2 Arbitrary Failures 

The width of an interval that is an abstract sensor value determines the
sensor’s accuracy. If the fraction f/n of abstract sensors that may be faulty
is too large, then one cannot bound the inaccuracy of the
resulting abstract sensor. The following theorem bounds f/n. Define the
functions $\min_i$ and $\max_i$ to be the $i$-th smallest and largest values of a set
of n values respectively. Note that $\min_i$ is the same as $\max_{n-i+1}$. For
example, if $S = \{13, 14, 15\}$ then $\min_3(S) = \max_1(S) = 15$.

Theorem 1 If $f < \lfloor (n+1)/2 \rfloor$ then $|\cap_{f,n}(S)| \le \min_{2f+1}\{|s| : s \in S\}$.

The proof of this theorem is in Appendix A.
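The order-statistic functions and the width bound named in Theorem 1 are easy to compute. In the Python sketch below the helper names `min_i`, `max_i` and `theorem1_width_bound` are hypothetical; the last one only evaluates the $\min_{2f+1}$ expression over the sensor widths and insists that 2f + 1 does not exceed n, so that the order statistic exists.

```python
from typing import Sequence


def min_i(values: Sequence[float], i: int) -> float:
    """The i-th smallest value (1-based)."""
    return sorted(values)[i - 1]


def max_i(values: Sequence[float], i: int) -> float:
    """The i-th largest value (1-based)."""
    return sorted(values, reverse=True)[i - 1]


def theorem1_width_bound(widths: Sequence[float], f: int) -> float:
    """The (2f + 1)-th smallest sensor width, the bound named in Theorem 1."""
    n = len(widths)
    if f < 0 or 2 * f + 1 > n:
        raise ValueError("need 2f + 1 <= n for the order statistic to exist")
    return min_i(widths, 2 * f + 1)


S = [13, 14, 15]
print(min_i(S, 3), max_i(S, 1))

# Hypothetical widths of five sensors, tolerating one fault.
print(theorem1_width_bound([4.0, 1.0, 3.0, 2.0, 5.0], f=1))
```

For the five hypothetical widths and f = 1, the bound is the third smallest width, so the averaged interval is no wider than a sensor of middling accuracy.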

If $f \ge \lfloor (n+1)/2 \rfloor$ then the derived interval can be more inaccurate than
any sensor in the system. Theorem 4 in Appendix A formally states this
property. An example is shown in Figure 5. Suppose the three sensors a, b
and c are “maliciously” faulty. They can make $\cap_{3,5}(S)$ as inaccurate as
desired by choosing appropriately distant values from intervals d and e.

Figure 5: Intersection with n = 5 and f = 3

One property of $\cap_{f,n}(S)$ is that, depending on the values of $S$, $\cap_{f,n}(S)$
can be more accurate than any sensor in $S$. Figure 6 illustrates this property.
Such a value of $S$ can result from different delays, errors, or other sources
of uncertainty that arise in computing the value of the abstract sensors
comprising $S$. This property makes replication of abstract sensors attractive
not only for tolerating failures, but also for increasing the expected accuracy
of a sensor’s value.

Figure 6: Intersection with n = 5 and f = 1

If $n = 2f + 1$, however, then the accuracy of $\cap_{f,n}(S)$ is limited, in that
it cannot be more accurate than the most accurate sensor in $S$. This is
illustrated in Figure 7 where f = 1 and n = 3; here, the only way we could
change sensor c so that it contains values outside of $\cap_{f,n}(S)$ would be to
make c more inaccurate than either a or b or to make c detectably faulty (as
discussed in Section 3.3). It is, therefore, advantageous to have $n > 2f + 1$ for
a system with arbitrary failures. Theorems 5 and 6 in Appendix A formally
state this property.

Figure 7: Intersection with n = 3 and f = 1

Theorem 1 bounds the accuracy of a derived abstract sensor in terms
of the accuracy of one of the abstract sensors used in its construction.
Such a bound is useful only if that sensor $s_i$ is not faulty; in particular, if $|s_i| \le acc_T$.
Hence, Theorem 1 only applies for arbitrary failures with bounded
inaccuracy. However, if this bounding sensor could have an erroneously large
inaccuracy, then the bound is not meaningful. Consider the sensors shown
in Figure 8. If sensor c is erroneously inaccurate, then the value of $\cap_{1,3}(S)$
is as inaccurate as c. Thus, the fraction f/n of abstract sensors that may be
faulty must be smaller than that stated in Theorem 1 when
sensors can have unbounded inaccuracy. Theorem 2 gives this bound on
f/n.

Figure 8: Intersection with n = 3 and f = 1
Theorem 2 Let $C$ be the (unknown) subset of $S$ that are correct. If $f \le \lfloor (n-1)/3 \rfloor$
(equivalently, $n > 3f$) then $|\cap_{f,n}(S)| \le \min_{f+1}\{|s| : s \in C\}$.

The proof of this theorem is simple: from Theorem 1,

$$|\cap_{f,n}(S)| \le \max_{n-2f}\{|s| : s \in S\}$$

For $|\cap_{f,n}(S)|$ to be bounded by a correct sensor, $n - 2f > f$ and so $n > 3f$.
The worst case is when f faulty sensors are the most inaccurate, so

$$|\cap_{f,n}(S)| \le \min_{f+1}\{|s| : s \in C\}$$

□

Under the hypothesis of Theorem 2, a minimum of four sensors are
necessary to tolerate a single faulty sensor. Figure 9 illustrates this case: even
if sensor d has an erroneously large inaccuracy, $|\cap_{1,4}(S)|$ is bounded by a
nonfaulty sensor.

Figure 9: Intersection with n = 4 and f = 1


3.3 Other Issues on Failure 

If $f' \le f$ sensors can be detected as failed then they can be removed from
$S$, and n and f can be reduced by $f'$ before computing $\cap_{f,n}(S)$. By doing
so, the ratio f/n will be decreased, thereby improving the bound on the
inaccuracy of $\cap_{f,n}(S)$. In a fail-stop failure model, all sensor failures are
detectable, meaning that up to n - 1 failures can be tolerated and $\cap_{f,n}(S)$
will be as accurate as the most accurate nonfaulty sensor. Additionally, the
running time of Algorithm 1 with fail-stop sensors is O(n).

We can use Algorithm 1 to detect some failed abstract sensors assuming
an arbitrary failure model. This algorithm is very simple: any sensor in
$S$ that does not intersect $\cap_{f,n}(S)$ cannot contain the correct value, and is
therefore incorrect.

Algorithm 2 Detecting failed sensors.

Given n sensors $S$ and a maximum number of faulty sensors f,
find a subset of the sensors $D \subseteq S$ that are incorrect.

Implementation: Compute $\cap_{f,n}(S)$ using Algorithm 1. Then,
$D = \{s : s \in S \wedge s \cap (\cap_{f,n}(S)) = \emptyset\}$.

It is likely that Algorithm 2 will fail to detect some of the incorrect
sensors. For example, using Algorithm 2 with the sensors in Figure 4 yields
$D = \emptyset$; even though we know that exactly one of the two sensors a and c must be
incorrect, we cannot tell which of the two is incorrect.
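The detection step is a simple filter once the fault-tolerant average is available. In the Python sketch below, the averaged interval `fused` is assumed to have been computed already by Algorithm 1 and is supplied by hand; the names `intersects` and `detect_failed` and the dictionary of named sensors are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # closed interval (lo, hi); lo > hi means empty


def intersects(a: Interval, b: Interval) -> bool:
    """True if the closed intervals share a point. An empty interval
    (lo > hi) intersects nothing, including itself."""
    return max(a[0], b[0]) <= min(a[1], b[1])


def detect_failed(sensors: Dict[str, Interval], fused: Interval) -> List[str]:
    """Names of sensors that cannot contain the correct value because
    they miss the fault-tolerant average entirely."""
    return [name for name, iv in sensors.items() if not intersects(iv, fused)]


# Three sensors arranged as in the Figure 4 discussion, f = 1; the average
# (2.0, 6.0) was worked out by hand for this sketch.
print(detect_failed({"a": (0.0, 4.0), "b": (2.0, 6.0), "c": (5.0, 9.0)},
                    fused=(2.0, 6.0)))

# Four sensors, f = 1; the points covered by three intervals are [3 .. 4].
print(detect_failed({"a": (0.0, 4.0), "b": (2.0, 6.0),
                     "c": (3.0, 7.0), "d": (20.0, 22.0)},
                    fused=(3.0, 4.0)))
```

In the first call nothing is detected, even though a and c cannot both be correct. In the second call sensor d is reported, because it lies entirely outside the averaged interval.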

So far, we have assumed that once a sensor fails it remains failed. This
assumption may not be realistic for sensors, since an abstract sensor
maintains no state. It seems natural to assume a sensor may occasionally fail in
an apparently malicious way and then “heal” itself and subsequently yield
correct values. So, a natural extension to the arbitrary failure model is to
denote the faulty sensors at time $t$ as a function $F(t)$ such that $\forall t : |F(t)| \le f$.
Unfortunately, we cannot construct a correct abstract sensor under these
conditions; the averager might be unlucky and each time read a (temporarily)
incorrect abstract sensor. We must also guarantee that there exists a
period $\Pi$ such that the number of failures in all time intervals of length $\Pi$
is bounded:

$$\exists \Pi > 0 : \forall t : \Big| \bigcup_{t' \,:\, t \le t' \le t + \Pi} F(t') \Big| \le f$$

If Algorithm 1 obtains values from each concrete sensor within $\Pi$ time units
then it constructs a correct abstract sensor. In the limit of large $\Pi$, this
model reduces to the earlier arbitrary failure model.

4 Example 

The methodology presented in this paper requires some thought to use.
An original specification may have to be changed to accommodate abstract
sensors, and it may be difficult to construct a set of independent abstract
sensors. In this section, we show an example of how a specification can be
converted from one that uses physical state variables to one using abstract
sensors. We also show how an abstract sensor can be implemented from a
concrete sensor.

As part of the Cornell Real-Time Reliable Distributed Systems (RR)
project, we are deriving correct process control programs from specifications.
One of the problems we have chosen is that of a train traversing a sequence of
n adjacent track segments of possibly unequal lengths. Assume that segment
$i$ spans track locations $c_i$ through $c_{i+1}$ where $(\forall i : 0 \le i < n : c_i < c_{i+1})$. A
train has position $x(t)$ and velocity $v(t)$, has zero length⁵, starts at position
$c_0 = 0$ and moves in the direction of increasing $x$ (towards $c_n$). Each track
segment has an associated minimum and maximum speed $min_i$ and $max_i$; if
the train exceeds these limits, it may derail. Additionally, there is a random
communications delay associated with all messages in the system that is
bounded by $\delta$ seconds.

A track circuit $\sigma(q, r)$ is a concrete sensor associated with a span of track
$q \le x \le r$. A nonfaulty track circuit returns true iff the train occupies any
part of the circuit’s span at the time the circuit is polled. We will assume
that there are M track circuits.

The safety condition for correct operation of the train is that it not
derail, or

$$S \stackrel{\text{def}}{=} \forall t, i : 1 \le i \le n : c_i \le x(t) \le c_{i+1} \Rightarrow min_i \le v(t) \le max_i$$

$S$ is expressed in terms of physical variables, so it must be changed to
be expressed in terms of abstract sensors. The obvious condition is

$$S' \stackrel{\text{def}}{=} \forall t, i : 1 \le i \le n : (\exists x \in \bar x(t) : c_i \le x \le c_{i+1}) \Rightarrow (\forall v \in \bar v(t) : min_i \le v \le max_i)$$

since this also excludes all unsafe states (at a penalty of running the train
conservatively).

Since the condition $S'$ refers to the abstract sensors $\bar x$ and $\bar v$, the control
program will need to refer to these sensors. We will show how an abstract
position sensor $\bar x_i$ can be constructed from the track circuits. The
simplest way to do this is to assume a bound on the velocity of the train
$v \le v_{max}$. Define the global array of M elements:

    var train[1..M] : {before, in, after} := before, ..., before;

Define a polling process for each track circuit. Note the delay is
represented by a delay statement; the implementation must ensure that no
more than $\Delta$ seconds elapse between successive polls of a sensor where $\Delta$ is
small enough so that the polling process does not “miss” the train traversing
the track segment it is monitoring: $\Delta < (r - q - \ldots)$. Assertion I
is a loop invariant, and $t$ is the current time.

⁵ In Appendix C we show that controlling a train of length $l > 0$ is equivalent to
controlling a train of zero length.

    process Poll[i] =
    begin
      {I : train[i] = before ⇒ 0 ≤ x(t) ≤ q + δ·v_max ∧
           train[i] = in ⇒ q ≤ x(t) ≤ r + δ·v_max ∧
           train[i] = after ⇒ r ≤ x(t) ≤ c_n}
      do true →
        delay Δ;
        if σ(q, r) ∧ (train[i] = before) → train[i] := in
        [] ¬σ(q, r) ∧ (train[i] = in) → train[i] := after
        [] ¬σ(q, r) ∧ (train[i] = before) → skip
        [] σ(q, r) ∧ (train[i] = in) → skip
        [] (train[i] = after) → skip
        fi
      od;
    end

The definition of the abstract sensor comes from the loop invariant I
and the distance the train could have moved since the last time $\sigma(q, r)$ was
read:

    x̄_i = if train[i] = before → [0 .. q + (δ + Δ)·v_max]
          [] train[i] = in → [q .. r + (δ + Δ)·v_max]
          [] train[i] = after → [r .. c_n]
          fi
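The mapping from the recorded state of one track circuit to a position interval can be sketched in Python as follows. The function name `position_sensor` and its parameter names are hypothetical; `delta` is the communication delay bound, `poll_period` the maximum time between polls, and `v_max` the assumed bound on the train's speed.

```python
from typing import Tuple


def position_sensor(state: str, q: float, r: float, c_n: float,
                    delta: float, poll_period: float,
                    v_max: float) -> Tuple[float, float]:
    """Position interval implied by one track circuit spanning [q .. r]."""
    # Farthest the train can have moved since the state was last known
    # to be accurate: one message delay plus one polling period.
    drift = (delta + poll_period) * v_max
    if state == "before":
        return (0.0, q + drift)
    if state == "in":
        return (q, r + drift)
    if state == "after":
        return (r, c_n)
    raise ValueError(f"unknown state: {state!r}")


# Hypothetical circuit covering 100 m to 200 m on a 1000 m track, with a
# 0.25 s delay bound, 0.25 s polling period and a 20 m/s speed bound.
for state in ("before", "in", "after"):
    print(state, position_sensor(state, q=100.0, r=200.0, c_n=1000.0,
                                 delta=0.25, poll_period=0.25, v_max=20.0))
```

Only the `in` state gives a tight interval, which illustrates why circuits far from the train contribute very inaccurate abstract sensors.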

Fault-tolerance is achieved by constructing an abstract position sensor from
each track circuit and then using Algorithm 1. Additional fault-tolerance
could be achieved by replicating each track circuit.

The abstract sensor developed here is too simplistic to be of any real use.
Correct track circuits far away from the train give very inaccurate bounds
on the train’s location, and by Theorem 2 the accuracy of the fault-tolerant
abstract sensor will be poor for any reasonable f. In the actual system,
we make use of an abstract sensor $\bar x_0(t)$ whose value is derived from the
initial condition $x(0) = 0$ and from the commands sent to the train. We
call this abstract sensor a model sensor since if it is incorrect, then either
the control program is faulty or the specification of the environment was
incorrect. The model sensor is initially very accurate, and can be used
to detect some of the failures of the abstract sensors $\bar x_i$. Having a model
sensor also simplifies the computation of the other abstract sensors. The
train has the property that if an abstract sensor $\bar x_i$ is computed from a
fixed set of track circuit polls and the commands sent to the train, then
the interval $[\bar x_i(t).min - \bar x_0(t).min \,..\, \bar x_i(t).max - \bar x_0(t).max]$ is a constant.
So, the implementation of $\bar x_i$ computes an accurate value of
$[\bar x_i(t).min - \bar x_0(t).min \,..\, \bar x_i(t).max - \bar x_0(t).max]$ at the time $t$ it notes the track circuit
first coming on, and computes $\bar x_i(t')$ for $t' > t$ as
$[\bar x_0(t').min + \bar x_i(t).min - \bar x_0(t).min \,..\, \bar x_0(t').max + \bar x_i(t).max - \bar x_0(t).max]$.
The implementation of $\bar x_i$
can do the same computation when the track circuit subsequently goes off,
and if the two resulting values of the abstract sensor do not intersect then
the abstract sensor is faulty.

For our program, it is necessary to ensure that $|\bar x(t)| \le acc_x$ where
$acc_x$ is the length of the shortest track segment. Given a value of $\Delta$, one can
estimate the accuracy of abstract sensors near the train, as these will be
the most accurate. The abstract sensors $\bar x_i$ have a known bound on their
accuracy, so Theorem 1 can be used to find the maximum value of f that
will guarantee $|\bar x(t)| \le acc_x$.

5 Discussion 

This paper presents a five-step process, through which a program written in
terms of physical state variables can be transformed into one that reads the
physical state variable through a set of concrete sensors, some of which may
be faulty. The degree of sensor replication depends on the failure model
being assumed. Figure 10 summarizes the maximum number of faulty sensors
that can be tolerated for the three failure models considered in this paper,
assuming that an unboundedly accurate sensor is desired.

The work presented here is part of the general problem of input
reification [9]. The results in this paper are a generalization of the work done by
the author and presented in [15,14]. This earlier work looked at the problem
of clock synchronization in a distributed system. A clock is a special kind of
sensor, in that the physical process it senses can be expressed simply.

    Failure Model                               f_max          min n: f = 1   min n: f = 2
    arbitrary failures, unbounded inaccuracy    ⌊(n - 1)/3⌋    4              7
    arbitrary failures, bounded inaccuracy      ⌊(n - 2)/2⌋    4              6
    fail-stop failures                          n - 1          2              3

Figure 10: Maximum failures for different error models

The approach presented in Section 2.2 for transforming specifications is
novel. Much work has been done on expressing and determining the validity
of properties that refer to real time (for example, [7,19]), but these
specifications are typically written in terms of physical state variables
where, for each variable, an a priori upper bound on its accuracy is known.

The methodology presented in this paper is related to the state machine 
approach [18,10]. A set of sensors of the same physical value can be thought 
of as a set of identical processors that return intervals rather than scalar 
values. In both cases, failures are masked by replication and voting. 

Studies on hierarchies of failure models (for example, [2,16]) originally
arose in the context of the agreement problem [5], a problem not addressed
here. If the control program were to be replicated, then the processes of
this program would need to use an agreement protocol to disseminate the
sensor's values [3,4,8]. There has been work on agreement on the value
of sensors. For example, the inexact agreement problem discussed in [13]
relates the accuracy of the agreement value to the number of rounds the
protocol executes. A different approach to agreement among sensors is
taken in [12], in which sensor failure is not considered.

The methodology presented here is incomplete. There are other kinds of
sensors than those considered here: for example, discrete sensors, such as
one denoting whether or not a door is open, or multivalued sensors, such as
one that returns the altitude and azimuth of an airplane. We are extending
the material in this paper to accommodate these more general sensors.

A Proofs

The four theorems in this appendix give upper and lower bounds of
$|\bigcap_{f,n}(S)|$. We need the following two definitions:

Definition 1 If $S$ is a set of intervals, a c-clique of $S$ is a subset $S'$
of $S$ where $|S'| = c$ and all the intervals in $S'$ mutually intersect.

Definition 2 A set $S$ of intervals is c-reduced if each interval in $S$ is a
member of a c-clique.

Note that a graph is $(n - f)$-reduced if and only if Algorithm 2 computes
the empty set.
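
Definitions 1 and 2 can be checked mechanically for small sets. The sketch below is an illustration only; it assumes closed intervals represented as `(min, max)` tuples, and the helper names are hypothetical. It relies on the fact that closed intervals on the real line mutually intersect exactly when they share a common point.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (min, max), a closed interval


def is_clique(intervals: List[Interval]) -> bool:
    """All intervals mutually intersect iff the largest minimum <= the smallest maximum."""
    if not intervals:
        return True
    return max(lo for lo, _ in intervals) <= min(hi for _, hi in intervals)


def depth(point: float, intervals: List[Interval]) -> int:
    """Number of intervals that contain the given point."""
    return sum(1 for lo, hi in intervals if lo <= point <= hi)


def in_c_clique(s: Interval, intervals: List[Interval], c: int) -> bool:
    """s belongs to some c-clique iff some point of s lies in at least c intervals.

    Within s, the depth can only rise at a minimum (s's own or another
    interval's), so those minimums are the only candidate points to test.
    """
    lo_s, hi_s = s
    candidates = [lo for lo, _ in intervals if lo_s <= lo <= hi_s]
    return any(depth(p, intervals) >= c for p in candidates)


def is_c_reduced(intervals: List[Interval], c: int) -> bool:
    """Definition 2: every interval is a member of some c-clique."""
    return all(in_c_clique(s, intervals, c) for s in intervals)
```

For the set `[(0, 4), (1, 5), (3, 8), (7, 9)]`, the first three intervals form a 3-clique, the whole set is 2-reduced, and it is not 3-reduced because `(7, 9)` meets only `(3, 8)`.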

The upper and lower bounds of $|\bigcap_{f,n}(S)|$ are as follows. Theorem 3
is the same as Theorem 1 and is repeated here for clarity:

Theorem 3 Let $S$ be a set consisting of $n$ intervals. If
$0 \le f < \lfloor (n+1)/2 \rfloor$ and $\bigcap_{f,n}(S) \ne \emptyset$, then
$|\bigcap_{f,n}(S)| \le \min_{2f+1}\{|s| : s \in S\}$.

Theorem 4 Given a set $\{\ell_1, \ell_2, \ldots, \ell_n\}$ of $n$ lengths and
$n > f \ge \lfloor (n+1)/2 \rfloor$, then for any length
$A \ge \min\{\ell_1, \ell_2, \ldots, \ell_n\}$ there exists a set of $n$ intervals
$S = \{s_1, s_2, \ldots, s_n\}$ where $\forall i : 1 \le i \le n : |s_i| = \ell_i$,
and $|\bigcap_{f,n}(S)| = A$.

Theorem 5 Let $S$ be a $(n - f)$-reduced set of $n$ intervals. If
$n > f \ge \lfloor n/2 \rfloor$ and $\bigcap_{f,n}(S) \ne \emptyset$, then
$|\bigcap_{f,n}(S)| \ge \max_{2(n-f)-1}\{|s| : s \in S\}$.

Theorem 6 Given a set $\{\ell_1, \ell_2, \ldots, \ell_n\}$ of $n$ lengths, an
arbitrarily small length $\epsilon$, and $0 \le f < \lfloor n/2 \rfloor$, there
exists a $(n - f)$-reduced set of $n$ intervals $S = \{s_1, s_2, \ldots, s_n\}$
where $\forall i : 1 \le i \le n : |s_i| = \ell_i$, and $|\bigcap_{f,n}(S)| = \epsilon$.

Theorem 4 can be shown by construction. Let $S$ consist of the following
two cliques:

• $C_1$ containing $n - f$ intervals, where each interval in this clique has
a minimum value of $u$, and by definition of $A$, a maximum value no
larger than $u + A$;

• $C_2$ containing $f$ intervals, where each interval in this clique has a
maximum value of $u + A$, and by definition of $A$, a minimum value no
smaller than $u$.

By hypothesis, $\lceil n/2 \rceil = \lfloor (n+1)/2 \rfloor \le f < n$, or
$2f \ge 2\lceil n/2 \rceil \ge n$, and so $f \ge n - f$, meaning both cliques are
contained in $\bigcap_{f,n}(S)$. So, $\bigcap_{f,n}(S) = [u \,..\, u + A]$ and the
theorem follows. □

Theorem 6 can also be shown by construction. Let $S$ consist of two
cliques:

• $C_1$ containing $\lfloor n/2 \rfloor$ intervals such that
$\lfloor (n-f)/2 \rfloor$ intervals have a maximum value of $u + \epsilon$ and
the remaining $\lfloor n/2 \rfloor - \lfloor (n-f)/2 \rfloor$ intervals have a
maximum value less than $u$;

• $C_2$ containing $\lceil n/2 \rceil$ intervals such that
$\lceil (n-f)/2 \rceil$ intervals have a minimum value of $u$ and the
remaining $\lceil n/2 \rceil - \lceil (n-f)/2 \rceil$ intervals have a minimum
value greater than $u + \epsilon$.

By hypothesis, $0 \le f < \lfloor n/2 \rfloor$ or
$\lfloor n/2 \rfloor \le \lceil n/2 \rceil < n - f \le n$, and so neither $C_1$
nor $C_2$ is entirely in $\bigcap_{f,n}(S)$. However, $n - f$ intervals
intersect over the interval $[u \,..\, u + \epsilon]$, and the theorem follows. □

To prove Theorems 3 and 5, we will need a few lemmas.

Lemma 1 Let $S$ be a set of $n$ intervals where $S$ contains at least one
c-clique and all c-cliques in $S$ have exactly $i$ intervals in common with
each other. Then, $n \ge c \ge i$ and $n \ge 2c - i$.

Proof: since $S$ contains at least one c-clique, we know $n \ge c$.
Furthermore, since all c-cliques in $S$ have exactly $i$ intervals in common,
each c-clique must have at least $i$ intervals, or $c \ge i$.

If $c = i$, then the smallest graph satisfying our assumptions is a single
i-clique, or $n = i = 2c - i$. If $c > i$, then $S$ must contain more than one
c-clique, for otherwise the single c-clique has $c > i$ intervals in common
with itself. The smallest such set of intervals consists of two c-cliques
sharing $i$ intervals. Each clique has $c - i$ intervals not in common with
each other, or $n = i + 2(c - i) = 2c - i$. □

Lemma 2 Let $S$ be a set of $n$ intervals where $S$ contains at least one
c-clique. If $n < 2c$, then all c-cliques in $S$ have at least $2c - n$
intervals in common with each other.

Proof: by contradiction. Suppose that all the c-cliques in $S$ have exactly
$i'$ intervals in common with each other, where $i' < 2c - n$. By Lemma 1,
$S$ contains at least $2c - i'$ intervals, or $n \ge 2c - i'$. Rearranging the
last inequality, we get $i' \ge 2c - n$, which contradicts our hypothesis. □

Lemma 3 Let $s \in S$ be any member of all maximal cliques of $S$. The cover
of the intersection of the maximal cliques is no larger than $|s|$.

Proof: The intersection of any maximal clique cannot contain any point
outside of $s$, since by definition that point is not in an intersection
containing $s$ and $s$ is a member of each clique. The cover only adds points
between the intersections. Since $S$ is a set of intervals over the reals,
$s$ must contain all points between the maximal cliques, so the cover does
not add any points outside $s$. Since all the points in the cover are also
in $s$, the cover cannot be larger than $|s|$. □

Theorem 3 can now be shown. From the definition of $\bigcap_{f,n}(S)$, the
maximal clique in $S$ must contain at least $n - f$ intervals, for otherwise
$\bigcap_{f,n}(S) = \emptyset$. By assumption, $f < \lfloor (n+1)/2 \rfloor$ or
$n < 2(n - f)$. By Lemma 2, at least $n - 2f$ intervals intersect all cliques.
By Lemma 3 the cover of the intersection cannot be larger than any of these
$n - 2f$ intervals. The cover, however, may be larger than any of the
remaining $2f$ intervals. In the worst case, these remaining intervals are
the smallest ones in $S$, and the theorem follows. □
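
The bound of Theorem 3 can be checked numerically on a small example. The sketch below is an illustration, not part of the proof: it assumes closed intervals given as `(min, max)` tuples, reads $\min_k$ as the k-th smallest value, and uses a brute-force routine (a hypothetical helper) that tests every endpoint.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (min, max), a closed interval


def cover_of_deep_points(intervals: List[Interval], f: int) -> Optional[Interval]:
    """Brute force: cover of all points contained in at least n - f intervals.

    The extreme points of that set are always interval endpoints, so it is
    enough to test every endpoint.
    """
    need = len(intervals) - f
    endpoints = [p for interval in intervals for p in interval]
    deep = [p for p in endpoints
            if sum(lo <= p <= hi for lo, hi in intervals) >= need]
    return (min(deep), max(deep)) if deep else None


def kth_smallest_length(intervals: List[Interval], k: int) -> float:
    """The k-th smallest interval length, counting from k = 1."""
    return sorted(hi - lo for lo, hi in intervals)[k - 1]


S = [(0, 4), (1, 5), (3, 8), (7, 9)]          # n = 4, lengths 4, 4, 5, 2

# f = 1 satisfies f < floor((n + 1) / 2) = 2, so Theorem 3 applies.
lo, hi = cover_of_deep_points(S, 1)           # (3, 4)
bounded = (hi - lo) <= kth_smallest_length(S, 2 * 1 + 1)

# f = 2 violates the hypothesis, and the result is wider than every interval.
lo2, hi2 = cover_of_deep_points(S, 2)         # (1, 8)
exceeds_all = (hi2 - lo2) > max(b - a for a, b in S)
```

With $f = 1$ the result has width 1, within the third-smallest length 4. With $f = 2$ the hypothesis fails and the result has width 7, larger than any interval in the set, which is the kind of behavior Theorem 4 describes.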

Lemma 4 Let $S$ be a c-reduced set of $n$ intervals, and let the intervals
$s_i$ in $S$ be ordered such that $\min s_i \le \min s_j$ if $i < j$. Then, the
intervals $s_1, s_2, \ldots, s_c$ form a c-clique.

Proof: by induction. The lemma is trivially true for $c = 1$ since any
interval is by itself a 1-clique. So, we assume the lemma holds for $c = k$
and show that it holds for $c = k + 1$. Let $S$ be a $(k + 1)$-reduced set
of intervals. If a set is $(k + 1)$-reduced then it is k-reduced, so by the
induction hypothesis the intervals $s_1, s_2, \ldots, s_k$ form a k-clique. If
$s_{k+1}$ does not intersect some interval $s_i : 1 \le i \le k$, then all
intervals $s_j : j > k + 1$ also do not intersect $s_i$, and so $s_i$ is not a
member of a $(k + 1)$-clique. This contradicts our assumption that $S$ is
$(k + 1)$-reduced, and so $s_{k+1}$ must intersect each interval
$s_1, s_2, \ldots, s_k$, and the lemma holds. □

The same argument can be used to prove the following lemma: 

Lemma 5 Let $S$ be a c-reduced set of $n$ intervals, and let the intervals
$s_i$ in $S$ be ordered such that $\max s_i \ge \max s_j$ if $i < j$. Then, the
intervals $s_1, s_2, \ldots, s_c$ form a c-clique.

Lemma 6 If $S$ is a $(n - f)$-reduced set of $n$ intervals, then

$$\bigcap_{f,n}(S) = [\min{}_{n-f+1}\{\min s : s \in S\} \;..\; \max{}_{n-f+1}\{\max s : s \in S\}]$$

Proof: this lemma follows directly from Lemma 4, Lemma 5 and the
definition of $\bigcap_{f,n}(S)$. □

Theorem 5 can now be shown. From Lemma 6, all intervals intersect
$\bigcap_{f,n}(S)$ and there are exactly $2(n - f - 1)$ (not necessarily
distinct) intervals that extend outside of $\bigcap_{f,n}(S)$. This means that
there are at least $n - 2(n - f - 1) = 2(f + 1) - n$ intervals that are
completely contained by $\bigcap_{f,n}(S)$. So,
$|\bigcap_{f,n}(S)| \ge \min_{2(f+1)-n}\{|s| : s \in S\}$ or
$|\bigcap_{f,n}(S)| \ge \max_{2(n-f)-1}\{|s| : s \in S\}$. □

B Algorithms for Computing $\bigcap_{f,n}(S)$

This section contains some algorithms for computing $\bigcap_{f,n}(S)$. A set
of abstract sensors is isomorphic to a class of graphs called interval
graphs, which in turn are members of the class of triangulated graphs. Such
graphs are interesting in that many problems, such as coloring, clique,
stable set and clique cover, can be solved for triangulated graphs in
polynomial time. A good reference on triangulated graphs is [6], which
includes efficient algorithms that solve the above problems.

The value of $\bigcap_{f,n}(S)$ is $[l \,..\, h]$ where $l$ is the smallest point
contained in $n - f$ intervals and $h$ is the largest point contained in
$n - f$ intervals, and where a point $x$ is contained in an interval $s$ if and
only if $\min s \le x \le \max s$. Suppose that there are $a$ intervals $s$ in $S$
such that $\min s \le x$ and that there are $b$ intervals $s'$ in $S$ such that
$\max s' < x$. Any interval not counted in $a$ cannot contain $x$, and the
intervals counted in $b$ are those that were counted in $a$ but cannot
contain $x$, so $x$ is contained in exactly $a - b$ intervals.





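Let $v$ be an array of $2n$ pairs where for each $s_i \in S$,
$v_{2i} = (\min s_i, 1)$ and $v_{2i+1} = (\max s_i, -1)$. Given a point $x$,

$$a = \sum_{\forall i \,:\, v_i[1] \le x \,\wedge\, v_i[2] = 1} v_i[2]$$

and

$$b = - \sum_{\forall i \,:\, v_i[1] < x \,\wedge\, v_i[2] = -1} v_i[2]$$

or

$$\text{number of } s \in S \text{ containing } x = a - b = \sum_{\forall i \,:\, v_i[1] < x} v_i[2] + \sum_{\forall i \,:\, v_i[1] = x \,\wedge\, v_i[2] = 1} v_i[2]$$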

Computing the number of intervals in $S$ that contain $x$ can be made
linear if $v$ is sorted. Define
$v_i < v_j \equiv (v_i[1] < v_j[1]) \vee (v_i[1] = v_j[1] \wedge v_i[2] > v_j[2])$,
and let $v'$ be $v$ sorted with respect to $<$. Then,

$$\text{number of } s \in S \text{ containing } x = \sum_{i=0}^{\max j \,:\, v'_j[1] \le x \,\wedge\, ((v'_j[1] = x) \Rightarrow (v'_j[2] = 1))} v'_i[2] \qquad (1)$$

Recall that $l$ is the smallest point contained in $n - f$ intervals. Thus, $l$
is the smallest $x$ that makes Equation 1 equal to $n - f$, which is
$v'_{low}[1]$ where

$$low = \min j : \sum_{i=0}^{j} v'_i[2] = (n - f)$$

Similarly, $h$ is the largest point contained in $n - f$ intervals, which is
the largest $x$ that makes Equation 1 equal to $n - f$. This point is also the
maximum value of some interval such that all points greater than $x$ are
contained in no more than $n - f - 1$ intervals, or $h$ is $v'_{high}[1]$ where

$$high = \max j : \sum_{i=0}^{j} v'_i[2] = (n - f - 1)$$





Both $low$ and $high$ can be computed from $v'$ in $O(n)$ time, and $v'$ can be
computed from $v$ in $O(n \log n)$ time, so the overall running time is
$O(n \log n)$.
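
A minimal Python sketch of this computation follows. It is an illustration under assumptions rather than the paper's code: intervals are closed `(min, max)` tuples, the function name is hypothetical, and instead of evaluating the $low$ and $high$ index formulas literally it keeps a running sum of the second components while scanning the sorted array once.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (min, max), a closed interval


def fault_tolerant_intersection(intervals: List[Interval], f: int) -> Optional[Interval]:
    """Return [l .. h], the cover of all points lying in at least n - f intervals.

    Returns None when no point lies in n - f intervals.
    """
    n = len(intervals)
    if not 0 <= f < n:
        raise ValueError("need 0 <= f < n")
    need = n - f

    # v holds (value, +1) for every minimum and (value, -1) for every maximum.
    v = [(lo, 1) for lo, _ in intervals] + [(hi, -1) for _, hi in intervals]
    # Sort by value; at equal values the +1 entries come first, which is the
    # ordering v_i < v_j given in the text.
    v.sort(key=lambda pair: (pair[0], -pair[1]))

    low = high = None
    count = 0  # running sum of the second components
    for value, delta in v:
        before = count
        count += delta
        if delta == 1 and count >= need and low is None:
            low = value   # smallest point contained in n - f intervals
        if delta == -1 and before >= need:
            high = value  # largest point seen so far contained in n - f intervals
    return None if low is None else (low, high)
```

For example, with `[(0, 4), (1, 5), (3, 8), (7, 9)]` the call with `f=1` gives `(3, 4)` and the call with `f=2` gives `(1, 8)`. The sort dominates the cost, so the sketch runs in $O(n \log n)$ time, in line with the analysis above.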

There are two cases for which $\bigcap_{f,n}(S)$ can be calculated faster than
$O(n \log n)$:

1. $\bigcap_{n-1,n}(S)$ is the cover of $S$, or $l$ is the smallest minimum
value of the intervals and $h$ is the largest maximum value of the
intervals. For our purposes, however, this case is not very interesting.

2. If all of the intervals in $S$ mutually intersect, then all of the minimum
values of these intervals are less than or equal to the smallest maximum
value of these intervals (this can be tested for in $O(n)$ time). Under this
condition, the array $v'$ consists of all of the minimum values (having
$v_i[2] = 1$) followed by the maximum values (having $v_i[2] = -1$). Thus,
$l$ is the $(f+1)$st largest minimum and $h$ is the $(f+1)$st smallest maximum,
both of which can be calculated in $O(n)$ time [1]. If $f = 0$ or a fail-stop
failure model is assumed, then we are interested in the value of
$\bigcap_{0,n}(S)$, which requires that all intervals mutually intersect and
can be calculated trivially in $O(n)$ time.
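
The second case can be sketched as follows. This is an illustration under assumptions: intervals are closed `(min, max)` tuples and the function names are hypothetical. Note that the sketch uses Python's `heapq.nlargest` and `heapq.nsmallest`, which take roughly $O(n \log f)$ time, as a stand-in for the linear-time selection algorithm of [1].

```python
import heapq
from typing import List, Tuple

Interval = Tuple[float, float]  # (min, max), a closed interval


def mutually_intersect(intervals: List[Interval]) -> bool:
    """O(n) test: every minimum is <= the smallest maximum."""
    return max(lo for lo, _ in intervals) <= min(hi for _, hi in intervals)


def intersection_when_clique(intervals: List[Interval], f: int) -> Interval:
    """[l .. h] for a set whose intervals all mutually intersect.

    l is the (f+1)st largest minimum and h is the (f+1)st smallest maximum.
    """
    if not 0 <= f < len(intervals):
        raise ValueError("need 0 <= f < n")
    if not mutually_intersect(intervals):
        raise ValueError("intervals do not all mutually intersect")
    # nlargest returns values in descending order, nsmallest in ascending
    # order, so the last element of each list is the (f+1)st one.
    l = heapq.nlargest(f + 1, (lo for lo, _ in intervals))[-1]
    h = heapq.nsmallest(f + 1, (hi for _, hi in intervals))[-1]
    return (l, h)
```

With `[(0, 10), (1, 11), (2, 12)]`, `f=0` yields the plain intersection `(2, 10)` and `f=1` yields `(1, 11)`.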


C Train Length 

In the example of Section 4, we assumed the train had zero length. This
is not an unreasonable assumption, since we can show that for every train
of length $L$ on a track $K$, there exists a track $K'$ such that a zero-length
train is constrained in exactly the same way as the original train on $K$. In
this section, we show how to determine the track $K'$ from $L$ and $K$. The
method is an example of transforming to configuration space [11].

A track $K$ is defined by three sets
$(\forall i : 1 \le i \le n : \{c_i\}, \{min_i\}, \{max_i\})$ where $c_i$ is the
location of the end of track segment $i$, $min_i$ is the minimum allowable
speed on segment $i$ and $max_i$ is the maximum allowable speed on segment
$i$. If the train has length $L$ and the tail of the train is at $x$, the
safety condition is that all parts of the train satisfy the speed
constraints, or

$$S_L(x, v) \equiv \forall x', i : x \le x' \le x + L,\ 1 \le i \le n : c_i \le x' \le c_{i+1} \Rightarrow min_i \le v \le max_i$$

Suppose we could find a track
$K' : (\forall i : 1 \le i \le n' : \{c'_i\}, \{min'_i\}, \{max'_i\})$ such that

$$S_L(x, v) = (S_0(x, v) \stackrel{def}{=} \forall i : 1 \le i \le n' : c'_i \le x \le c'_{i+1} \Rightarrow min'_i \le v \le max'_i)$$

$S_0$ is the safety condition for a zero-length train on track $K'$ which is
constrained in exactly the same way an $L$-length train on the original track
is constrained. If we can find $K'$ then we can write a program that controls
a zero-length train on $K'$, and this program will also control the $L$-length
train on $K$.

Define the two functions

$$Min(L, x) = \forall j : x \le c_j \le x + L : \max\ min_j$$

$$Max(L, x) = \forall j : x \le c_j \le x + L : \min\ max_j$$

These functions determine the actual speed bounds the train must follow
when at $x$. With them, $S_L$ can be rewritten as
$S_L : Min(L, x) \le v \le Max(L, x)$.

We can now find the values of $K'$ that allow $S_L(x, v)$ to be rewritten as
$S_0(x, v)$. Both $Min(L, x)$ and $Max(L, x)$ are piecewise constant functions,
so we can define the track segments of $K'$ to be the spans where both
$Min(L, x)$ and $Max(L, x)$ are constant. Let $c'_i$ be the union of the points
of inflection of $Min(L, x)$ and $Max(L, x)$, and let

$$min'_i = \lim_{\delta \to 0} Min(L, c'_i + \delta)$$

$$max'_i = \lim_{\delta \to 0} Max(L, c'_i + \delta)$$
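
The construction of $K'$ can be sketched in Python. This is an illustration under several assumptions that are not in the paper: a track is represented as a sorted, contiguous list of hypothetical `Segment` records with explicit start and end points (rather than the $c_i$ values alone), each segment is treated as half-open so that the value at the start of a span plays the role of the limit above, and the train is shorter than the track.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    start: float      # where the segment begins
    end: float        # where the segment ends (treated as exclusive)
    min_speed: float  # minimum allowable speed
    max_speed: float  # maximum allowable speed


def speed_bounds(track: List[Segment], length: float, x: float) -> Tuple[float, float]:
    """(Min(L, x), Max(L, x)) for a train of the given length whose tail is at x."""
    touched = [s for s in track if s.start <= x + length and x < s.end]
    if not touched:
        raise ValueError("train is not on the track")
    return (max(s.min_speed for s in touched),
            min(s.max_speed for s in touched))


def configuration_space(track: List[Segment], length: float) -> List[Segment]:
    """Track K' on which a zero-length train obeys the same speed bounds."""
    first = track[0].start
    last = track[-1].end - length  # the tail cannot go further without leaving K
    cuts = {first, last}
    for seg in track[1:]:
        # The bounds can change when the head enters a segment (tail at
        # start - length) and when the tail leaves the previous one (tail at start).
        for c in (seg.start - length, seg.start):
            if first < c < last:
                cuts.add(c)
    points = sorted(cuts)
    return [Segment(a, b, *speed_bounds(track, length, a))
            for a, b in zip(points, points[1:])]
```

For a three-segment track with a train of length 20, the sketch produces five spans. Adjacent spans with identical bounds are not merged here, so the output can contain more segments than strictly necessary.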

Figure 11 shows an example of $K'$ given $K$ and $L$. Each track segment is
drawn with the maximum speed above the segment and the minimum speed
below the segment. Note $K'$ is shorter than $K$ by $L$, since the end of the
train cannot traverse the whole length of $K$ without the train leaving $K$.
Here, $K$ and $K'$ have the same number of segments; in general, $K'$ can have
up to twice as many segments as $K$.

Figure 11: Configuration Space